Introduction

Clustering is an unsupervised learning technique that divides data into a number of groups (clusters). It is used in many applications, such as market segmentation, anomaly detection and medical imaging. In this article I cluster the biggest Polish cities by apartment prices, using data published by the National Bank of Poland (NBP). The analysis first searches for the optimal number of clusters and then, based on the chosen number, runs several clustering algorithms.

Exploratory data analysis

First, load the libraries.

library(cluster)
library(factoextra)
library(flexclust)
library(fpc)
library(clustertend)
library(ClusterR)
library(kableExtra)

library(data.table)
library(stringr)
library(tidyverse)
library(knitr)
library(ggthemes)
library(plotly)

The data on cities come from the NBP quarterly reports on the real estate market. To make this research reproducible, I copied the whole dataset into the folded R chunk below.

Reading data

data_1 <- fread("Białystok  Bydgoszcz   Gdańsk  Gdynia  Katowice    Kielce  Kraków  Lublin  Łódź    Olsztyn Opole   Poznań  Rzeszów Szczecin    Warszawa    Wrocław
3070,00 2530,00 4541,00 5756,00 2507,52 2424,92 7114,00 3160,00 2740,00 3414,43 3164,08 3751,81 2851,00 3189,80 7179,00 5260,77
3408,00 2784,00 5406,00 6496,00 2649,12 3127,42 7383,00 3562,00 3316,00 3925,00 3584,54 4461,66 3734,00 3856,59 8751,00 5856,74
3986,00 3607,00 6115,00 7211,00 3703,64 3903,90 8369,00 3986,00 4130,00 5049,13 3349,43 6104,06 4187,00 4734,75 9316,00 6746,54
4418,00 3932,00 6602,00 6747,00 4152,12 4120,78 8272,00 4699,00 4609,00 5352,28 4149,53 6698,11 4647,00 4958,81 9740,00 7038,09
4580,00 4150,00 6740,00 7188,00 4150,53 4230,50 8255,00 4815,00 4721,00 5394,15 4135,53 6386,66 4814,00 5093,84 10078,00    7193,98
4637,00 3956,00 6824,00 6917,00 4473,12 4299,54 8140,00 4962,00 4686,00 5169,65 4362,62 6471,78 4849,00 5297,29 9952,00 7265,66
4738,00 3685,00 6795,00 6984,00 4739,42 4528,46 7979,00 5141,00 4737,00 5199,61 4235,14 6235,31 4901,00 5124,49 9850,00 7138,06
4853,00 4202,00 6704,00 6854,00 4638,18 4478,01 7934,00 5071,00 4668,00 5014,25 4068,43 6096,79 4824,00 5179,89 9783,00 7158,82
4777,00 4151,00 6608,00 6960,00 4554,93 4573,55 7180,00 5111,00 4544,00 4964,12 4186,67 5979,68 4790,00 5174,22 9679,00 6921,94
4744,00 4240,00 6793,00 7059,00 4360,31 4356,19 7251,00 5091,00 4873,00 4946,72 4142,94 5953,70 4640,00 5079,09 10196,00    6859,45
4613,00 4095,00 6648,00 7104,00 4340,42 4467,06 6864,00 5062,00 4730,00 4730,22 4234,59 5978,52 4683,00 4965,16 9626,00 6784,57
4614,00 4017,00 6644,00 6949,00 4288,48 4286,91 6678,00 5017,00 4431,00 4710,13 4282,80 6216,01 4684,00 4860,20 10133,00    6748,86
4641,00 4244,00 6449,00 6901,00 4370,00 4498,77 6564,00 5010,00 4643,00 4733,41 4082,64 5970,06 4641,00 4964,27 9705,24 6841,82
4620,00 4203,00 6350,00 6991,00 4181,00 4433,91 6734,00 4977,00 4400,00 4719,66 4154,50 5924,45 4681,00 4834,07 9671,30 6757,76
4729,00 4202,00 6465,00 7032,00 4136,00 4604,93 6916,00 5028,00 4558,00 4763,00 4166,15 6073,87 4611,00 4867,64 9901,01 6725,00
4709,00 4270,00 6491,00 6901,00 4256,00 4505,51 6937,00 5037,00 4407,00 4709,39 4193,79 6122,49 4654,00 4843,57 9982,01 6678,00
4752,00 4233,00 6286,00 6700,00 4323,00 4626,17 6909,00 5063,00 4406,00 4759,09 4295,47 6012,14 4615,00 4864,11 9787,59 6562,00
4809,00 4005,00 6494,00 6510,00 4108,00 4706,91 6964,00 5139,00 4457,00 4726,00 4137,34 6047,34 4696,00 4760,22 9766,84 6602,00
4862,00 4055,00 6536,00 6522,00 4233,84 4678,33 7206,00 5161,00 4469,00 4719,00 4222,24 5940,19 4726,00 4759,00 9706,00 6576,00
4810,00 4153,00 6573,00 6493,00 3974,14 4622,27 7246,00 5199,00 4363,00 4698,00 4246,00 5910,99 4777,00 4733,00 9471,75 6582,00
4778,00 4058,00 6472,00 6430,00 3964,85 4436,11 6949,00 5149,00 4238,00 4679,00 4455,65 5804,14 4729,00 4707,00 9396,57 6541,00
4754,00 3923,00 6241,91 6462,60 4152,49 4535,64 6989,00 5050,00 3990,00 4625,00 4174,98 5736,53 4791,00 4704,00 9363,30 6397,00
4724,00 3885,00 6384,00 6370,00 4052,69 4510,47 6775,56 5057,00 4006,00 4628,00 4171,12 5604,16 4801,00 4586,21 9110,86 6367,00
4674,00 3932,00 6309,52 6475,04 4121,55 4545,12 6724,24 5076,00 4033,00 4632,00 3981,23 5534,71 4905,00 4431,00 9035,28 6307,00
4588,00 3606,00 6273,74 6451,45 4057,38 4464,39 6617,04 5065,00 3794,00 4544,00 4065,77 5518,54 4935,00 4441,71 8899,90 6182,00
4583,00 3882,00 6239,75 6474,53 4137,88 4469,15 6648,38 4995,00 3854,00 4465,00 4012,61 5446,27 4912,00 4336,21 8767,55 6100,00
4576,00 3639,00 6277,90 6608,04 3955,35 4366,75 6644,00 4491,00 3975,00 4364,80 4073,76 5406,26 4908,00 4434,36 8606,02 5959,00
4546,00 3713,00 6151,73 6337,04 4000,76 4338,06 6488,58 4858,00 4058,00 4424,73 4062,40 5657,19 4890,00 4170,89 8637,97 5986,00
4520,00 3607,00 6007,49 6618,06 3973,02 4330,15 6585,00 4935,00 4018,00 4389,59 4246,39 5628,48 4910,00 4151,63 8544,23 5984,00
4494,00 3736,00 6136,03 6629,00 3927,87 4215,41 6537,00 4901,00 3978,00 4417,78 4174,43 5717,87 4934,00 4247,23 8626,63 6098,00
4510,00 4019,00 6102,82 6410,00 3915,27 4215,32 6754,22 4886,00 3984,00 4442,92 3974,74 5830,06 4946,00 4399,45 8622,18 6096,00
4450,00 3536,00 6073,01 6492,00 4045,57 4186,47 6682,36 4884,00 3915,00 4420,13 4231,23 5742,14 4908,00 4315,46 8690,68 5899,00
4443,00 3758,00 5858,06 6657,00 4026,95 4174,16 6644,17 4831,00 3907,00 4405,31 4096,66 5805,92 4933,00 4288,72 8625,78 5980,00
4445,00 3830,00 5872,59 6466,00 3917,61 4147,65 6860,48 4854,00 3892,00 4316,01 4109,18 5693,84 4953,00 4345,76 8635,85 6017,00
4423,00 3685,00 5981,75 6324,04 3928,25 4053,29 7030,00 4844,00 3923,00 4347,68 4141,30 5847,74 4959,00 4352,91 8607,92 5901,00
4463,00 3765,00 5949,31 6187,43 3978,21 4028,79 6977,53 4953,00 3865,00 4301,00 4157,01 5830,81 5010,00 4414,14 8552,52 5812,00
4402,00 3949,00 5993,17 6312,03 3936,15 4066,86 6947,76 4884,00 3872,00 4339,69 4200,09 5408,85 4973,00 4213,56 8565,03 5930,00
4456,00 3774,00 6132,55 6402,07 3909,23 4034,51 6794,80 4893,00 3850,00 4309,58 4232,34 5729,95 4954,00 4235,33 8655,34 5914,00
4488,00 3875,00 6192,65 6406,70 3920,09 4026,07 6827,49 4947,00 3940,00 4373,78 4186,52 5940,92 4916,00 4432,22 8657,87 5951,00
4536,00 3834,00 6319,00 6795,02 3970,73 4004,12 6756,32 4990,00 4009,00 4379,97 4221,20 6040,45 4914,00 4355,76 8720,91 5984,00
4578,00 3909,00 6226,18 6729,56 3988,05 4058,49 6837,00 4992,00 4036,00 4352,94 4365,74 6092,56 4954,00 4396,02 8777,94 6062,00
4568,00 4048,00 6455,36 6653,74 3957,20 4008,01 6910,00 4980,00 4096,00 4383,91 4253,58 6122,68 4953,00 4358,24 8708,57 6165,00
4580,00 4123,00 6565,67 6822,45 3929,43 4091,04 6859,13 5047,00 4149,00 4436,62 4394,75 5954,88 4995,00 4461,35 8815,73 6253,00
4597,00 4195,00 6969,52 7186,27 3944,79 4064,92 6992,49 5073,00 4203,00 4471,26 4521,89 6079,52 5023,00 4469,04 8884,97 6267,00
4597,00 4285,00 7035,25 7204,31 3899,19 4117,17 7204,80 5102,00 4241,00 4449,00 4580,94 6052,79 5121,00 4672,46 9008,71 6293,00
4746,00 4328,00 7345,07 7383,30 4012,85 4189,23 7593,25 5142,00 4314,00 4516,00 4585,99 6349,40 5225,00 4654,28 9235,34 6365,47
4921,00 4460,00 7727,44 7301,35 4314,63 4308,88 7767,10 5139,00 4432,00 4655,01 4787,42 6524,63 5394,00 4752,46 9346,35 6422,75
5059,00 4488,00 8567,19 8005,52 4146,17 4364,07 8006,10 5339,00 4642,00 4728,43 4971,85 6651,53 5414,00 4933,30 9347,00 6485,00
5134,00 4526,00 8739,88 8559,82 4173,06 4364,87 8059,37 5295,00 4711,00 4867,12 5005,95 6762,69 5476,00 4973,70 9611,98 6491,00
5333,00 4684,00 8856,31 8490,24 4734,21 4540,63 8465,92 5507,00 4811,00 5025,52 5191,66 6939,32 5659,00 5176,81 10277,19    6571,00
5718,00 4789,00 9415,16 8668,09 4953,18 4749,88 8697,03 5641,00 5053,00 5252,35 5236,14 6958,78 5851,00 5458,06 10287,43    7339,00
5743,00 4970,00 9344,85 9440,14 4913,98 4759,32 8898,79 5733,00 5116,00 5397,12 5313,16 7073,27 6132,00 5657,47 10575,01    7441,00
5794,12 5252,00 9957,57 9138,80 5617,63 4796,67 8912,86 5965,00 5203,00 5536,09 5325,12 7187,92 6243,00 5786,02 10815,53    7572,00
6063,29 5430,00 9888,55 9078,40 5431,37 5078,87 9109,02 6170,59 5392,00 5674,19 5539,52 7638,71 6552,00 5856,67 11191,65    7720,00
6245,29 5553,00 10562,09    8582,53 5780,84 5446,54 9517,66 6666,33 5465,00 5812,99 5848,66 7809,13 6904,00 6158,59 11656,39    8158,14
6487,81 5851,00 10332,01    8562,56 5820,86 5451,23 9672,44 6835,37 5431,52 6015,25 5934,13 7823,23 7006,49 6217,79 11519,55    8024,05
", dec=",")
data_1 <- data.frame(data_1)
rownames(data_1) <- seq(2006.75,2020.5,by=0.25)

data_1_long <- reshape2::melt(data_1)
colnames(data_1_long) <- c('City','Price')
data_1_long$Date <- rep(seq(2006.75,2020.5,by=0.25),16)
data_1 %>% kbl(row.names=T) %>% kable_paper() %>% scroll_box(width = "900px", height = "400px") 
Quarter Białystok Bydgoszcz Gdańsk Gdynia Katowice Kielce Kraków Lublin Łódź Olsztyn Opole Poznań Rzeszów Szczecin Warszawa Wrocław
2006.75 3070.00 2530 4541.00 5756.00 2507.52 2424.92 7114.00 3160.00 2740.00 3414.43 3164.08 3751.81 2851.00 3189.80 7179.00 5260.77
2007 3408.00 2784 5406.00 6496.00 2649.12 3127.42 7383.00 3562.00 3316.00 3925.00 3584.54 4461.66 3734.00 3856.59 8751.00 5856.74
2007.25 3986.00 3607 6115.00 7211.00 3703.64 3903.90 8369.00 3986.00 4130.00 5049.13 3349.43 6104.06 4187.00 4734.75 9316.00 6746.54
2007.5 4418.00 3932 6602.00 6747.00 4152.12 4120.78 8272.00 4699.00 4609.00 5352.28 4149.53 6698.11 4647.00 4958.81 9740.00 7038.09
2007.75 4580.00 4150 6740.00 7188.00 4150.53 4230.50 8255.00 4815.00 4721.00 5394.15 4135.53 6386.66 4814.00 5093.84 10078.00 7193.98
2008 4637.00 3956 6824.00 6917.00 4473.12 4299.54 8140.00 4962.00 4686.00 5169.65 4362.62 6471.78 4849.00 5297.29 9952.00 7265.66
2008.25 4738.00 3685 6795.00 6984.00 4739.42 4528.46 7979.00 5141.00 4737.00 5199.61 4235.14 6235.31 4901.00 5124.49 9850.00 7138.06
2008.5 4853.00 4202 6704.00 6854.00 4638.18 4478.01 7934.00 5071.00 4668.00 5014.25 4068.43 6096.79 4824.00 5179.89 9783.00 7158.82
2008.75 4777.00 4151 6608.00 6960.00 4554.93 4573.55 7180.00 5111.00 4544.00 4964.12 4186.67 5979.68 4790.00 5174.22 9679.00 6921.94
2009 4744.00 4240 6793.00 7059.00 4360.31 4356.19 7251.00 5091.00 4873.00 4946.72 4142.94 5953.70 4640.00 5079.09 10196.00 6859.45
2009.25 4613.00 4095 6648.00 7104.00 4340.42 4467.06 6864.00 5062.00 4730.00 4730.22 4234.59 5978.52 4683.00 4965.16 9626.00 6784.57
2009.5 4614.00 4017 6644.00 6949.00 4288.48 4286.91 6678.00 5017.00 4431.00 4710.13 4282.80 6216.01 4684.00 4860.20 10133.00 6748.86
2009.75 4641.00 4244 6449.00 6901.00 4370.00 4498.77 6564.00 5010.00 4643.00 4733.41 4082.64 5970.06 4641.00 4964.27 9705.24 6841.82
2010 4620.00 4203 6350.00 6991.00 4181.00 4433.91 6734.00 4977.00 4400.00 4719.66 4154.50 5924.45 4681.00 4834.07 9671.30 6757.76
2010.25 4729.00 4202 6465.00 7032.00 4136.00 4604.93 6916.00 5028.00 4558.00 4763.00 4166.15 6073.87 4611.00 4867.64 9901.01 6725.00
2010.5 4709.00 4270 6491.00 6901.00 4256.00 4505.51 6937.00 5037.00 4407.00 4709.39 4193.79 6122.49 4654.00 4843.57 9982.01 6678.00
2010.75 4752.00 4233 6286.00 6700.00 4323.00 4626.17 6909.00 5063.00 4406.00 4759.09 4295.47 6012.14 4615.00 4864.11 9787.59 6562.00
2011 4809.00 4005 6494.00 6510.00 4108.00 4706.91 6964.00 5139.00 4457.00 4726.00 4137.34 6047.34 4696.00 4760.22 9766.84 6602.00
2011.25 4862.00 4055 6536.00 6522.00 4233.84 4678.33 7206.00 5161.00 4469.00 4719.00 4222.24 5940.19 4726.00 4759.00 9706.00 6576.00
2011.5 4810.00 4153 6573.00 6493.00 3974.14 4622.27 7246.00 5199.00 4363.00 4698.00 4246.00 5910.99 4777.00 4733.00 9471.75 6582.00
2011.75 4778.00 4058 6472.00 6430.00 3964.85 4436.11 6949.00 5149.00 4238.00 4679.00 4455.65 5804.14 4729.00 4707.00 9396.57 6541.00
2012 4754.00 3923 6241.91 6462.60 4152.49 4535.64 6989.00 5050.00 3990.00 4625.00 4174.98 5736.53 4791.00 4704.00 9363.30 6397.00
2012.25 4724.00 3885 6384.00 6370.00 4052.69 4510.47 6775.56 5057.00 4006.00 4628.00 4171.12 5604.16 4801.00 4586.21 9110.86 6367.00
2012.5 4674.00 3932 6309.52 6475.04 4121.55 4545.12 6724.24 5076.00 4033.00 4632.00 3981.23 5534.71 4905.00 4431.00 9035.28 6307.00
2012.75 4588.00 3606 6273.74 6451.45 4057.38 4464.39 6617.04 5065.00 3794.00 4544.00 4065.77 5518.54 4935.00 4441.71 8899.90 6182.00
2013 4583.00 3882 6239.75 6474.53 4137.88 4469.15 6648.38 4995.00 3854.00 4465.00 4012.61 5446.27 4912.00 4336.21 8767.55 6100.00
2013.25 4576.00 3639 6277.90 6608.04 3955.35 4366.75 6644.00 4491.00 3975.00 4364.80 4073.76 5406.26 4908.00 4434.36 8606.02 5959.00
2013.5 4546.00 3713 6151.73 6337.04 4000.76 4338.06 6488.58 4858.00 4058.00 4424.73 4062.40 5657.19 4890.00 4170.89 8637.97 5986.00
2013.75 4520.00 3607 6007.49 6618.06 3973.02 4330.15 6585.00 4935.00 4018.00 4389.59 4246.39 5628.48 4910.00 4151.63 8544.23 5984.00
2014 4494.00 3736 6136.03 6629.00 3927.87 4215.41 6537.00 4901.00 3978.00 4417.78 4174.43 5717.87 4934.00 4247.23 8626.63 6098.00
2014.25 4510.00 4019 6102.82 6410.00 3915.27 4215.32 6754.22 4886.00 3984.00 4442.92 3974.74 5830.06 4946.00 4399.45 8622.18 6096.00
2014.5 4450.00 3536 6073.01 6492.00 4045.57 4186.47 6682.36 4884.00 3915.00 4420.13 4231.23 5742.14 4908.00 4315.46 8690.68 5899.00
2014.75 4443.00 3758 5858.06 6657.00 4026.95 4174.16 6644.17 4831.00 3907.00 4405.31 4096.66 5805.92 4933.00 4288.72 8625.78 5980.00
2015 4445.00 3830 5872.59 6466.00 3917.61 4147.65 6860.48 4854.00 3892.00 4316.01 4109.18 5693.84 4953.00 4345.76 8635.85 6017.00
2015.25 4423.00 3685 5981.75 6324.04 3928.25 4053.29 7030.00 4844.00 3923.00 4347.68 4141.30 5847.74 4959.00 4352.91 8607.92 5901.00
2015.5 4463.00 3765 5949.31 6187.43 3978.21 4028.79 6977.53 4953.00 3865.00 4301.00 4157.01 5830.81 5010.00 4414.14 8552.52 5812.00
2015.75 4402.00 3949 5993.17 6312.03 3936.15 4066.86 6947.76 4884.00 3872.00 4339.69 4200.09 5408.85 4973.00 4213.56 8565.03 5930.00
2016 4456.00 3774 6132.55 6402.07 3909.23 4034.51 6794.80 4893.00 3850.00 4309.58 4232.34 5729.95 4954.00 4235.33 8655.34 5914.00
2016.25 4488.00 3875 6192.65 6406.70 3920.09 4026.07 6827.49 4947.00 3940.00 4373.78 4186.52 5940.92 4916.00 4432.22 8657.87 5951.00
2016.5 4536.00 3834 6319.00 6795.02 3970.73 4004.12 6756.32 4990.00 4009.00 4379.97 4221.20 6040.45 4914.00 4355.76 8720.91 5984.00
2016.75 4578.00 3909 6226.18 6729.56 3988.05 4058.49 6837.00 4992.00 4036.00 4352.94 4365.74 6092.56 4954.00 4396.02 8777.94 6062.00
2017 4568.00 4048 6455.36 6653.74 3957.20 4008.01 6910.00 4980.00 4096.00 4383.91 4253.58 6122.68 4953.00 4358.24 8708.57 6165.00
2017.25 4580.00 4123 6565.67 6822.45 3929.43 4091.04 6859.13 5047.00 4149.00 4436.62 4394.75 5954.88 4995.00 4461.35 8815.73 6253.00
2017.5 4597.00 4195 6969.52 7186.27 3944.79 4064.92 6992.49 5073.00 4203.00 4471.26 4521.89 6079.52 5023.00 4469.04 8884.97 6267.00
2017.75 4597.00 4285 7035.25 7204.31 3899.19 4117.17 7204.80 5102.00 4241.00 4449.00 4580.94 6052.79 5121.00 4672.46 9008.71 6293.00
2018 4746.00 4328 7345.07 7383.30 4012.85 4189.23 7593.25 5142.00 4314.00 4516.00 4585.99 6349.40 5225.00 4654.28 9235.34 6365.47
2018.25 4921.00 4460 7727.44 7301.35 4314.63 4308.88 7767.10 5139.00 4432.00 4655.01 4787.42 6524.63 5394.00 4752.46 9346.35 6422.75
2018.5 5059.00 4488 8567.19 8005.52 4146.17 4364.07 8006.10 5339.00 4642.00 4728.43 4971.85 6651.53 5414.00 4933.30 9347.00 6485.00
2018.75 5134.00 4526 8739.88 8559.82 4173.06 4364.87 8059.37 5295.00 4711.00 4867.12 5005.95 6762.69 5476.00 4973.70 9611.98 6491.00
2019 5333.00 4684 8856.31 8490.24 4734.21 4540.63 8465.92 5507.00 4811.00 5025.52 5191.66 6939.32 5659.00 5176.81 10277.19 6571.00
2019.25 5718.00 4789 9415.16 8668.09 4953.18 4749.88 8697.03 5641.00 5053.00 5252.35 5236.14 6958.78 5851.00 5458.06 10287.43 7339.00
2019.5 5743.00 4970 9344.85 9440.14 4913.98 4759.32 8898.79 5733.00 5116.00 5397.12 5313.16 7073.27 6132.00 5657.47 10575.01 7441.00
2019.75 5794.12 5252 9957.57 9138.80 5617.63 4796.67 8912.86 5965.00 5203.00 5536.09 5325.12 7187.92 6243.00 5786.02 10815.53 7572.00
2020 6063.29 5430 9888.55 9078.40 5431.37 5078.87 9109.02 6170.59 5392.00 5674.19 5539.52 7638.71 6552.00 5856.67 11191.65 7720.00
2020.25 6245.29 5553 10562.09 8582.53 5780.84 5446.54 9517.66 6666.33 5465.00 5812.99 5848.66 7809.13 6904.00 6158.59 11656.39 8158.14
2020.5 6487.81 5851 10332.01 8562.56 5820.86 5451.23 9672.44 6835.37 5431.52 6015.25 5934.13 7823.23 7006.49 6217.79 11519.55 8024.05

As we can see, the data run from the third quarter of 2006 to the second quarter of 2020 and cover the 16 biggest Polish cities. Let's look at some basic statistics.

summary(data_1) %>% kable()%>%
  kable_material(c("striped", "hover"))  %>% scroll_box(width = "900px", height = "400px") 
Białystok Bydgoszcz Gdańsk Gdynia Katowice Kielce Kraków Lublin Łódź Olsztyn Opole Poznań Rzeszów Szczecin Warszawa Wrocław
Min. :3070 Min. :2530 Min. : 4541 Min. :5756 Min. :2508 Min. :2425 Min. :6489 Min. :3160 Min. :2740 Min. :3414 Min. :3164 Min. :3752 Min. :2851 Min. :3190 Min. : 7179 Min. :5261
1st Qu.:4518 1st Qu.:3816 1st Qu.: 6182 1st Qu.:6475 1st Qu.:3957 1st Qu.:4120 1st Qu.:6771 1st Qu.:4899 1st Qu.:3982 1st Qu.:4415 1st Qu.:4137 1st Qu.:5741 1st Qu.:4728 1st Qu.:4387 1st Qu.: 8704 1st Qu.:6051
Median :4617 Median :4034 Median : 6468 Median :6771 Median :4115 Median :4360 Median :6971 Median :5042 Median :4278 Median :4667 Median :4211 Median :5979 Median :4911 Median :4720 Median : 9347 Median :6454
Mean :4739 Mean :4100 Mean : 6875 Mean :7025 Mean :4208 Mean :4340 Mean :7351 Mean :5062 Mean :4327 Mean :4716 Mean :4361 Mean :6076 Mean :5002 Mean :4744 Mean : 9387 Mean :6524
3rd Qu.:4786 3rd Qu.:4241 3rd Qu.: 6802 3rd Qu.:7187 3rd Qu.:4327 3rd Qu.:4537 3rd Qu.:7945 3rd Qu.:5140 3rd Qu.:4649 3rd Qu.:4951 3rd Qu.:4410 3rd Qu.:6264 3rd Qu.:4999 3rd Qu.:4967 3rd Qu.: 9803 3rd Qu.:6799
Max. :6488 Max. :5851 Max. :10562 Max. :9440 Max. :5821 Max. :5451 Max. :9672 Max. :6835 Max. :5465 Max. :6015 Max. :5934 Max. :7823 Max. :7006 Max. :6218 Max. :11656 Max. :8158

Even these simple statistics show that prices vary a lot between cities. In some cities the average price in 2020 was still lower than the price in Warszawa or Kraków almost 14 years earlier. This is a good sign for clustering analysis. Let's visualise the prices now.
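
Before plotting, the comparison above can be checked directly. This is a minimal sketch with the prices copied by hand from the first (2006.75) and last (2020.5) rows of the table; the vector names `first_quarter`, `last_quarter` and `cheaper` are mine, chosen for illustration.

```r
# Prices in PLN copied from the first (2006.75) and last (2020.5) rows of data_1.
first_quarter <- setNames(c(7179.00, 7114.00), c("Warszawa", "Kraków"))
last_quarter <- setNames(
  c(6487.81, 5851.00, 10332.01, 8562.56, 5820.86, 5451.23, 9672.44, 6835.37,
    5431.52, 6015.25, 5934.13, 7823.23, 7006.49, 6217.79, 11519.55, 8024.05),
  c("Białystok", "Bydgoszcz", "Gdańsk", "Gdynia", "Katowice", "Kielce",
    "Kraków", "Lublin", "Łódź", "Olsztyn", "Opole", "Poznań",
    "Rzeszów", "Szczecin", "Warszawa", "Wrocław")
)

# Cities whose latest price is still below what BOTH Warszawa and Kraków
# already cost at the start of the series.
cheaper <- names(last_quarter)[last_quarter < min(first_quarter)]
cheaper
length(cheaper)   # 10
```

Ten of the sixteen cities meet the condition, which is why the price levels alone already hint at a group structure.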

myplots = lapply(data_1, function(col)
  ggplot(data_1) + geom_line(aes(y=col,x=seq(2006.75,2020.5,by=0.25)),size=2,color="brown3") + 
    coord_cartesian(ylim=c(0,12000)) + labs( x ="Time", y = "Price in PLN") + theme_wsj() + theme(axis.title=element_text(size=12)) )

k=1
for(i in myplots){
  cat('###',names(myplots)[k],'<br>',' \n')
  print(i)
  cat('\n', '<br>', '\n\n')
  k=k+1
}

[Figures: one run chart per city, in the order Białystok, Bydgoszcz, Gdańsk, Gdynia, Katowice, Kielce, Kraków, Lublin, Łódź, Olsztyn, Opole, Poznań, Rzeszów, Szczecin, Warszawa, Wrocław; x-axis: Time, y-axis: Price in PLN]

The run charts show that price levels differ between cities, although the dynamics look very similar in almost all of them: high growth at the beginning, followed by a period of stagnation and then rapid growth at the end.
Let's make a boxplot and a run chart with all cities at once; maybe we will be able to spot the number of clusters with the naked eye.

Boxplot

data_1_long %>% ggplot() + geom_boxplot(aes(y=Price,x=reorder(City,Price),color=City)) + theme_bw() +
  theme(legend.position = "none") + coord_flip() + labs(x= "City",y='Price')

Run chart

ggplotly(data_1_long %>% ggplot() + geom_line(aes(y=Price,x=Date,color=City)) + theme_bw())

Inspecting both plots, we can clearly distinguish three clusters. The first consists only of Warszawa; the second of Kraków, Gdynia, Gdańsk, Wrocław and Poznań; the third includes all other cities. This is exactly how NBP states that this data should be clustered. Let's run the clustering algorithms now. First we run the Hopkins test to confirm that the data is clusterable, and then we check the optimal number of clusters using three statistics: silhouette, WSS (within-cluster sum of squares) and the gap statistic.
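
Before moving on, the visual grouping can be cross-checked numerically. This is a rough sketch and a deliberate simplification of the full time-series clustering performed below: each city is reduced to its mean price (rounded values copied by hand from the `summary()` table), and base R's `hclust()` with its default complete linkage is cut into three groups. The names `city_means`, `tree` and `groups` are mine.

```r
# Mean price per city (PLN), rounded values copied from the summary() table.
city_means <- setNames(
  c(4739, 4100, 6875, 7025, 4208, 4340, 7351, 5062,
    4327, 4716, 4361, 6076, 5002, 4744, 9387, 6524),
  c("Białystok", "Bydgoszcz", "Gdańsk", "Gdynia", "Katowice", "Kielce",
    "Kraków", "Lublin", "Łódź", "Olsztyn", "Opole", "Poznań",
    "Rzeszów", "Szczecin", "Warszawa", "Wrocław")
)

# Hierarchical clustering on the one-dimensional means (complete linkage is
# the hclust() default), cut into three groups.
tree <- hclust(dist(city_means))
groups <- cutree(tree, k = 3)

# List the members of each group.
split(names(groups), groups)
```

On these means the cut yields one group of ten cities, one of five and Warszawa on its own, which matches the grouping read off the plots.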

Clustering

get_clust_tendency(t(data_1),15,graph=FALSE)
## $hopkins_stat
## [1] 0.7599558
## 
## $plot
## NULL

The Hopkins statistic is equal to 0.76. A Hopkins value of 0.5 means that the data is random. When it is close to 1, we can assume that the data is highly clusterable and the clusters are visible. In our case the value lies about halfway between 0.5 and 1. We will treat the data as clusterable, although silhouette and other statistics that measure clustering quality will probably not be the highest.
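
To make the statistic less of a black box, here is a minimal base R sketch of one common formulation of the Hopkins statistic, chosen so that values near 1 indicate clustered data, in line with the interpretation above. It runs on synthetic data and is not the code that `get_clust_tendency()` executes internally; the function `hopkins_sketch` and its arguments are hypothetical.

```r
# Sketch of the Hopkins statistic H = sum(u) / (sum(u) + sum(w)), where
#   u = nearest-neighbour distances from m uniform random points to the data,
#   w = nearest-neighbour distances from m sampled data points to the rest.
# H near 0.5 suggests no structure; H near 1 suggests clusters.
hopkins_sketch <- function(x, m) {
  x <- as.matrix(x)
  n <- nrow(x)
  nearest <- function(point, pts) {
    diffs <- pts - matrix(point, nrow(pts), ncol(pts), byrow = TRUE)
    min(sqrt(rowSums(diffs^2)))
  }
  # w: sample m real observations, measure distance to their nearest neighbour
  idx <- sample(n, m)
  w <- sapply(idx, function(i) nearest(x[i, ], x[-i, , drop = FALSE]))
  # u: draw m points uniformly inside the bounding box of the data
  unif <- apply(x, 2, function(col) runif(m, min(col), max(col)))
  u <- apply(unif, 1, function(p) nearest(p, x))
  sum(u) / (sum(u) + sum(w))
}

set.seed(42)
# Synthetic data (not the apartment prices): two tight, well-separated blobs
clustered <- rbind(matrix(rnorm(100, mean = 0, sd = 0.1), ncol = 2),
                   matrix(rnorm(100, mean = 10, sd = 0.1), ncol = 2))
# Synthetic data with no structure at all
uniform <- matrix(runif(200), ncol = 2)

h_clustered <- hopkins_sketch(clustered, m = 20)
h_uniform <- hopkins_sketch(uniform, m = 20)
c(clustered = h_clustered, uniform = h_uniform)
```

The clustered sample should score close to 1 and the structureless one close to 0.5, which puts the 0.76 obtained for the cities between the two extremes.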

for(i in c("silhouette", "wss" ,"gap_stat")){
  cat('###',i,'<br>',' \n')
  print(fviz_nbclust(t(data_1), FUNcluster = kmeans, method = i))
  cat('\n', '<br>', '\n\n')
}

[Figures: optimal number of clusters for k-means according to silhouette, wss and gap_stat]

Of the three statistics, only silhouette and WSS agree with each other. They indicate that the optimal number of clusters is two, but the silhouette value for three clusters is almost as good as for two. Now we will use the same function with four different algorithms and the same statistic, silhouette.
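
As a side note before comparing algorithms, the sketch below shows what the WSS and silhouette criteria actually measure by computing them by hand for k = 2, ..., 5. It is a simplification: each city is reduced to its rounded mean price from the `summary()` table, so the numbers will not match the `fviz_nbclust()` plots, which use the full price series. The helper `avg_silhouette` is hypothetical and written in base R.

```r
# Mean price per city (PLN), rounded values copied from the summary() table.
city_means <- c(4739, 4100, 6875, 7025, 4208, 4340, 7351, 5062,
                4327, 4716, 4361, 6076, 5002, 4744, 9387, 6524)

# Average silhouette width for cluster labels `cl` and distances `d`.
# For point i: a = mean distance to its own cluster, b = smallest mean distance
# to any other cluster, s = (b - a) / max(a, b); singletons get s = 0.
avg_silhouette <- function(d, cl) {
  d <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)
    a <- sum(d[i, own]) / (sum(own) - 1)   # d[i, i] = 0, so divide by n - 1
    b <- min(tapply(d[i, !own], cl[!own], mean))
    (b - a) / max(a, b)
  })
  mean(s)
}

set.seed(1)
d <- dist(city_means)
ks <- 2:5
fits <- lapply(ks, function(k) kmeans(city_means, centers = k, nstart = 50))

result <- data.frame(
  k = ks,
  wss = sapply(fits, function(f) f$tot.withinss),   # elbow criterion
  silhouette = sapply(fits, function(f) avg_silhouette(d, f$cluster))
)
result
```

WSS can only fall as k grows, so one looks for the bend in the curve, whereas the silhouette has a genuine maximum; that difference is why the two criteria are read differently in the plots.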

k=1
vec <- c("kmeans","pam","clara","hcut")
for(i in c(kmeans,pam,clara,hcut)){
  cat('###',vec[k],'<br>',' \n')
  print(fviz_nbclust(t(data_1), FUNcluster = i, method = "silhouette"))
  cat('\n', '<br>', '\n\n')
  k=k+1
}

[Figures: silhouette-based optimal number of clusters for kmeans, pam, clara and hcut]

The charts of the optimal number of clusters are almost the same: two or three clusters should be chosen. To stay in line with NBP, I will choose three clusters. Let's visualise them.

kmeans

fviz_cluster(eclust(t(data_1),FUNcluster = "kmeans" ,k = 3,graph = F))

pam

fviz_cluster(eclust(t(data_1),FUNcluster = "pam" ,k = 3,graph = F))

clara

fviz_cluster(eclust(t(data_1),FUNcluster = "clara" ,k = 3,graph = F))

hclust

fviz_cluster(eclust(t(data_1),FUNcluster = "hclust" ,k = 3,graph = F))

All the algorithms produced the same clustering, which is great news. It looks like choosing three clusters was a good option, and it does not matter which clustering algorithm we use; none of them has problems with this data. Let's measure the quality of the clustering using a silhouette plot.
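
Agreement between algorithms can also be verified programmatically rather than by comparing plots. This is a minimal sketch, assuming the cluster labels of two algorithms are available as vectors in the same city order (for an `eclust()` result this would be its `$cluster` component). `same_partition` is a hypothetical helper, and the label vectors below are made up for illustration.

```r
# Two labelings describe the same partition if every cluster of one maps onto
# exactly one cluster of the other, i.e. the cross-table has a single non-zero
# cell in every row and every column. The label numbers themselves may differ.
same_partition <- function(a, b) {
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Made-up labels for six observations (not results from the article)
labels_a <- c(1, 1, 2, 2, 3, 3)
labels_b <- c(3, 3, 1, 1, 2, 2)   # same groups, different numbering
labels_c <- c(1, 1, 2, 3, 3, 3)   # one observation moved

same_partition(labels_a, labels_b)   # TRUE
same_partition(labels_a, labels_c)   # FALSE
```

Comparing label vectors with `==` would fail here, because algorithms number their clusters arbitrarily; the cross-table check ignores the numbering.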

fviz_silhouette(eclust(t(data_1),"kmeans", hc_metric="euclidean",k=3,graph=FALSE))
##   cluster size ave.sil.width
## 1       1    1          0.00
## 2       2   10          0.77
## 3       3    5          0.57
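
The per-cluster averages printed above combine into one overall figure: the average silhouette width is the size-weighted mean of the cluster averages. The short check below uses vector names of my own choosing. The single-member cluster has width 0 by convention, because a silhouette cannot be computed for a lone observation.

```r
# Per-cluster sizes and average silhouette widths, copied from the output above.
size <- c(1, 10, 5)
ave_sil_width <- c(0.00, 0.77, 0.57)

# Overall average silhouette width = size-weighted mean of the cluster averages.
overall <- weighted.mean(ave_sil_width, w = size)
round(overall, 2)   # 0.66
```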

The average silhouette width is quite high, at 0.66, which reassures us that the analysis was carried out properly. Finally, I will repeat the two plots from the beginning, this time coloured by cluster assignment.

c2 <- eclust(t(data_1),FUNcluster = "kmeans" ,k = 3,graph = FALSE)
data_1_long <- merge(data_1_long,data.frame('nm' = names(c2$cluster),'cl' = c2$cluster),by.x='City',by.y='nm')

Boxplot

data_1_long %>% ggplot() + geom_boxplot(aes(y=Price,x=reorder(City,Price),color=factor(cl))) + theme_bw() +
  theme(legend.position = "none") + coord_flip() + labs(x= "City",y='Price')

Run chart

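# group=City keeps one line per city; factor(cl) gives discrete cluster colours
ggplotly(data_1_long %>% ggplot() + geom_line(aes(y=Price,x=Date,group=City,color=factor(cl))) + theme_bw())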

Summary

To sum up, the goal of this analysis was to check whether we can cluster the cities according to their apartment prices. The results showed that the chosen data could easily be clustered using basic algorithms like k-means, PAM or hierarchical clustering. The clusters were clearly visible, and the results agree with the number and composition of clusters used by NBP. The research also showed how much information we can obtain from exploratory data analysis: looking just at descriptive statistics and plotting the data gave us important hints about how many clusters should be chosen and even what they should look like. Clustering one-dimensional data should always be preceded by EDA, which helps us understand the data better and can improve further analysis.